Results 1-20 of 55
1.
Nature ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38570684

ABSTRACT

Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size¹. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions²,³. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome⁴,⁵. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
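A fold increase in single-nucleotide variation, like the one reported above, is a ratio of two SNV densities. Because a large share of centromeric sequence cannot be aligned, one reasonable choice is to count only aligned bases in each denominator. The Python sketch below makes that assumption. The function names are hypothetical, the numbers are toy values, and the sketch does not reproduce the study's pipeline.

```python
def snv_density(snv_count: int, aligned_bp: int) -> float:
    """SNVs per aligned base pair; unaligned sequence is left out of the denominator."""
    if aligned_bp <= 0:
        raise ValueError("aligned_bp must be positive")
    return snv_count / aligned_bp


def fold_increase(
    region_snvs: int, region_aligned_bp: int, flank_snvs: int, flank_aligned_bp: int
) -> float:
    """Ratio of SNV density in a region (e.g. a centromere) to that in its flanks."""
    return snv_density(region_snvs, region_aligned_bp) / snv_density(
        flank_snvs, flank_aligned_bp
    )


# Hypothetical toy counts, not data from the study.
print(fold_increase(300, 50_000, 100, 100_000))
```

If unaligned bases were left in the region's denominator, its density, and therefore the fold change, would be understated.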

2.
bioRxiv ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38654825

ABSTRACT

TBC1D3 is a primate-specific gene family that has expanded in the human lineage and has been implicated in neuronal progenitor proliferation and expansion of the frontal cortex. The gene family and its expression have been challenging to investigate because it is embedded in high-identity and highly variable segmental duplications. We sequenced and assembled the gene family using long-read sequencing data from 34 humans and 11 nonhuman primate species. Our analysis shows that this particular gene family has independently duplicated in at least five primate lineages, and the duplicated loci are enriched at sites of large-scale chromosomal rearrangements on chromosome 17. We find that most humans vary along two TBC1D3 clusters where human haplotypes are highly variable in copy number, differing by as many as 20 copies, and structure (structural heterozygosity 90%). We also show evidence of positive selection, as well as a significant change in the predicted human TBC1D3 protein sequence. Lastly, we find that, despite multiple duplications, human TBC1D3 expression is limited to a subset of copies and, most notably, from a single paralog group: TBC1D3-CDKL. These observations may help explain why a gene potentially important in cortical development can be so variable in the human population.

3.
Cell ; 187(6): 1547-1562.e13, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38428424

ABSTRACT

We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.


Subjects
Genome , Primates , Animals , Humans , Base Sequence , Primates/classification , Primates/genetics , Biological Evolution , Sequence Analysis, DNA , Genomic Structural Variation
4.
medRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496498

ABSTRACT

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
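The two read-level summary statistics quoted above follow standard definitions. Mean depth is total sequenced bases divided by genome size; this is an approximation, and depth computed from alignments can differ. Read N50 is the length L such that reads of length L or longer hold at least half of all sequenced bases. The Python sketch below computes both from a list of read lengths in bases. The function names are hypothetical and are not part of the consortium's pipeline.

```python
def read_n50(read_lengths: list[int]) -> int:
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not read_lengths:
        raise ValueError("read_lengths must be non-empty")
    total = sum(read_lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def mean_depth(read_lengths: list[int], genome_size: int) -> float:
    """Approximate mean depth of coverage as total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size
```

For example, `read_n50([2, 3, 4, 5, 6, 10])` walks 10 then 6. At that point 16 of the 30 bases are covered, so the N50 is 6.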

5.
bioRxiv ; 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38464314

ABSTRACT

Down syndrome is the most common form of human intellectual disability caused by precocious segregation and nondisjunction of chromosome 21. Differences in centromere structure have been hypothesized to play a potential role in this process in addition to the well-established risk of advancing maternal age. Using long-read sequencing, we completely sequenced and assembled the centromeres from a parent-child trio where Trisomy 21 arose in the child as a result of a meiosis I error. The proband carries three distinct chromosome 21 centromere haplotypes that vary by 11-fold in length, with both the largest (H1) and smallest (H2) originating from the mother. The longest H1 allele harbors a less clearly defined centromere dip region (CDR) as defined by CpG methylation and a significantly reduced signal by CENP-A chromatin immunoprecipitation sequencing when compared to H2 or paternal H3 centromeres. These epigenetic signatures suggest less competent kinetochore attachment for the maternally transmitted H1. Analysis of H1 in the mother indicates that the reduced CENP-A ChIP-seq signal, but not the CDR profile, pre-existed the meiotic nondisjunction event. A comparison of the three proband centromeres to a population sampling of 35 completely sequenced chromosome 21 centromeres shows that H2 is the smallest centromere sequenced to date and all three haplotypes (H1-H3) share a common origin of ~15 thousand years ago. These results suggest that recent asymmetry in size and epigenetic differences of chromosome 21 centromeres may contribute to nondisjunction risk.

6.
Nature ; 621(7978): 355-364, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date¹ and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes². Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established¹ boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.


Subjects
Chromosomes, Human, Y , Evolution, Molecular , Humans , Male , Chromosomes, Human, Y/genetics , Genome, Human/genetics , Genomics , Mutation Rate , Phenotype , Euchromatin/genetics , Pseudogenes , Genetic Variation/genetics , Chromosomes, Human, X/genetics , Pseudoautosomal Regions/genetics
7.
bioRxiv ; 2023 May 30.
Article in English | MEDLINE | ID: mdl-37398417

ABSTRACT

We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp, a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic to each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

8.
Nature ; 617(7960): 325-334, 2023 05.
Article in English | MEDLINE | ID: mdl-37165237

ABSTRACT

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data¹,². Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions³,⁴. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences⁵,⁶.


Subjects
Gene Conversion , Mutation , Segmental Duplications, Genomic , Humans , Gene Conversion/genetics , Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Haplotypes/genetics , Exons/genetics , Cytosine/chemistry , Guanine/chemistry , CpG Islands/genetics
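
The two mutational classes contrasted in the abstract above (C↔G transversions and CpG-associated mutations) can be tallied from a list of SNVs annotated with their trinucleotide context. The Python sketch below assumes each SNV is given as a (context, alt) pair with the reference base in the middle of the context. The helper names are hypothetical. "CpG-associated" is taken here to mean any SNV whose reference base lies in a CpG dinucleotide, which may be broader than the study's own definition.

```python
from collections.abc import Iterable

# Each SNV is (trinucleotide context with the reference base in the middle, alt base),
# e.g. ("ACG", "T") is a C>T change at a CpG site. C>G and G>C are counted together.


def is_c_g_transversion(ref: str, alt: str) -> bool:
    """True for C>G or G>C substitutions."""
    return {ref, alt} == {"C", "G"}


def is_cpg(context: str) -> bool:
    """True if the reference base is the C or the G of a CpG dinucleotide."""
    left, ref, right = context
    return (ref == "C" and right == "G") or (ref == "G" and left == "C")


def class_fractions(snvs: Iterable[tuple[str, str]]) -> tuple[float, float]:
    """Return (fraction of C<->G transversions, fraction of CpG-associated SNVs)."""
    items = list(snvs)
    if not items:
        raise ValueError("need at least one SNV")
    cg = sum(is_c_g_transversion(ctx[1], alt) for ctx, alt in items)
    cpg = sum(is_cpg(ctx) for ctx, _ in items)
    return cg / len(items), cpg / len(items)


def percent_change(test: float, baseline: float) -> float:
    """Relative change of test versus baseline, in percent."""
    return (test / baseline - 1.0) * 100.0
```

To compare the two sets, compute the fractions separately for SD and unique SNVs and pass each pair to `percent_change`. A positive value indicates an increase in SDs and a negative value a reduction.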
9.
bioRxiv ; 2023 May 04.
Article in English | MEDLINE | ID: mdl-37205567

ABSTRACT

Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
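The abstract above summarizes callset quality with the F1 score, the harmonic mean of precision and recall. The Python sketch below shows that arithmetic. It assumes variants have already been matched against a truth set, leaving only true-positive, false-positive and false-negative counts. The matching step is not shown, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Benchmark counts for a variant callset compared against a truth set."""
    tp: int  # calls that match a truth variant
    fp: int  # calls with no matching truth variant
    fn: int  # truth variants that were not called


def precision(c: CallCounts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: CallCounts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c: CallCounts) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

With hypothetical counts `CallCounts(tp=80, fp=20, fn=40)`, precision is 0.8, recall is 2/3 and F1 is 160/220 (about 0.73). That clears the 0.5 threshold mentioned above.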

11.
bioRxiv ; 2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36945442

ABSTRACT

To better understand the pattern of primate genome structural variation, we sequenced and assembled using multiple long-read sequencing technologies the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs based on analysis of these primate lineages. We identify 1,607 structurally divergent regions (SDRs) wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (CARDs, ABCD7, OLAH) and new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species for the first time.

12.
Mol Psychiatry ; 28(2): 822-833, 2023 02.
Article in English | MEDLINE | ID: mdl-36266569

ABSTRACT

Autism Spectrum Disorder (ASD) diagnosis remains behavior-based and the median age of diagnosis is ~52 months, nearly 5 years after its first-trimester origin. Accurate and clinically translatable early-age diagnostics do not exist due to ASD genetic and clinical heterogeneity. Here we collected clinical, diagnostic, and leukocyte RNA data from 240 ASD and typically developing (TD) toddlers (175 toddlers for training and 65 for test). To identify gene expression ASD diagnostic classifiers, we developed 42,840 models composed of 3570 gene expression feature selection sets and 12 classification methods. We found that 742 models had AUC-ROC ≥ 0.8 on both Training and Test sets. Weighted Bayesian model averaging of these 742 models yielded an ensemble classifier model with accurate performance in Training and Test gene expression datasets with ASD diagnostic classification AUC-ROC scores of 85-89% and AUC-PR scores of 84-92%. ASD toddlers with ensemble scores above and below the overall ASD ensemble mean of 0.723 (on a scale of 0 to 1) had similar diagnostic and psychometric scores, but those below this ASD ensemble mean had more prenatal risk events than TD toddlers. Ensemble model feature genes were involved in cell cycle, inflammation/immune response, transcriptional gene regulation, cytokine response, and PI3K-AKT, RAS and Wnt signaling pathways. We additionally collected targeted DNA sequencing smMIPs data on a subset of ASD risk genes from 217 of the 240 ASD and TD toddlers. This DNA sequencing found about the same percentage of SFARI Level 1 and 2 ASD risk gene mutations in TD (12 of 105) as in ASD (13 of 112) toddlers, and classification based only on the presence of mutation in these risk genes performed at a chance level of 49%. By contrast, the leukocyte ensemble gene expression classifier correctly diagnostically classified 88% of TD and ASD toddlers with ASD risk gene mutations. Our ensemble ASD gene expression classifier is diagnostically predictive and replicable across different toddler ages, races, and ethnicities; outperforms a risk gene mutation classifier; and has potential for clinical translation.


Subjects
Autism Spectrum Disorder , Humans , Child, Preschool , Infant , Autism Spectrum Disorder/diagnosis , Autism Spectrum Disorder/genetics , Bayes Theorem , Phosphatidylinositol 3-Kinases , Immunity , Gene Expression
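
AUC-ROC, the headline metric in the abstract above, equals the probability that a randomly chosen case scores higher than a randomly chosen control. The Python sketch below implements that pairwise (Mann-Whitney) definition, plus a weighted average that combines several models' scores into one ensemble score per sample. The weights are hypothetical inputs. In Bayesian model averaging they would be posterior model probabilities, and the sketch does not reproduce how the study derived them.

```python
def auc_roc(scores: list[float], labels: list[int]) -> float:
    """AUC-ROC as the probability that a random positive (label 1) outscores
    a random negative (label 0); tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_ensemble(model_scores: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of per-model scores, giving one ensemble score per sample."""
    total = sum(weights)
    n_samples = len(model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total
        for i in range(n_samples)
    ]
```

The pairwise loop is quadratic in cohort size but is easy to check by hand. For large datasets, a rank-based formula or a library routine gives the same value more efficiently.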
13.
Genome Res ; 33(12): 2029-2040, 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.


Subjects
Genomics , Nanopores , INDEL Mutation , Whole Genome Sequencing
14.
J Clin Invest ; 132(19)2022 10 03.
Article in English | MEDLINE | ID: mdl-35917186

ABSTRACT

Autism spectrum disorder (ASD) represents a group of neurodevelopmental phenotypes with a strong genetic component. An excess of likely gene-disruptive (LGD) mutations in GIGYF1 was implicated in ASD. Here, we report that GIGYF1 is the second-most mutated gene among known ASD high-confidence risk genes. We investigated the inheritance of 46 GIGYF1 LGD variants, including the highly recurrent mutation c.333del:p.L111Rfs*234. Inherited GIGYF1 heterozygous LGD variants were 1.8 times more common than de novo mutations. Among individuals with ASD, cognitive impairments were less likely in those with GIGYF1 LGD variants relative to those with other high-confidence gene mutations. Using a Gigyf1 conditional KO mouse model, we showed that haploinsufficiency in the developing brain led to social impairments without significant cognitive impairments. In contrast, homozygous mice showed more severe social disability as well as cognitive impairments. Gigyf1 deficiency in mice led to a reduction in the number of upper-layer cortical neurons, accompanied by a decrease in proliferation and increase in differentiation of neural progenitor cells. We showed that GIGYF1 regulated the recycling of IGF-1R to the cell surface. KO of GIGYF1 led to a decreased level of IGF-1R on the cell surface, disrupting the IGF-1R/ERK signaling pathway. In summary, our findings show that GIGYF1 is a regulator of IGF-1R recycling. Haploinsufficiency of GIGYF1 was associated with autistic behavior, likely through interference with IGF-1R/ERK signaling pathway.


Subjects
Autism Spectrum Disorder , Autistic Disorder , Animals , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/metabolism , Autistic Disorder/genetics , Autistic Disorder/metabolism , Mice , Neurons/metabolism , Phenotype , Signal Transduction
15.
Sci Adv ; 8(33): eabo7112, 2022 08 19.
Article in English | MEDLINE | ID: mdl-35977029

ABSTRACT

Stress granules (SGs) are cytoplasmic assemblies that form in response to a variety of stressors. We report a new neurodevelopmental disorder (NDD) with common features of language problems, intellectual disability, and behavioral issues caused by de novo likely gene-disruptive variants in UBAP2L, which encodes an essential regulator of SG assembly. Ubap2l haploinsufficiency in mouse led to social and cognitive impairments accompanied by disrupted neurogenesis and reduced SG formation during early brain development. On the basis of data from 40,853 individuals with NDDs, we report a nominally significant excess of de novo variants within 29 genes that are not implicated in NDDs, including 3 essential genes (G3BP1, G3BP2, and UBAP2L) in the core SG interaction network. We validated that NDD-related de novo variants in newly implicated and known NDD genes, such as CAPRIN1, disrupt the interaction of the core SG network and interfere with SG formation. Together, our findings suggest a common SG pathology in NDDs.


Subjects
DNA Helicases , Neurodevelopmental Disorders , Animals , Mice , Neurodevelopmental Disorders/genetics , Poly-ADP-Ribose Binding Proteins/genetics , RNA Helicases/genetics , RNA Recognition Motif Proteins , Stress Granules
16.
NPJ Genom Med ; 7(1): 38, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35715439

ABSTRACT

Recurrent copy-number variations (CNVs) at chromosome 16p11.2 are associated with neurodevelopmental diseases, skeletal system abnormalities, anemia, and genitourinary defects. Among the 40 protein-coding genes encompassed within the rearrangement, some have roles in leukocyte biology and immunodeficiency, like SPN and CORO1A. We therefore investigated leukocyte differential counts and disease in 16p11.2 CNV carriers. In our clinically-recruited cohort, we identified three deletion carriers from two families (out of 32 families assessed) with neutropenia and lymphopenia. They had no deleterious single-nucleotide or indel variant in known cytopenia genes, suggesting a possible causative role of the deletion. Notably, all three individuals had the lowest copy number of the human-specific BOLA2 duplicon (copy-number range: 3-8). Consistent with the lymphopenia and in contrast with the neutropenia associations, adult deletion carriers from UK biobank (n = 74) showed lower lymphocyte (Padj = 0.04) and increased neutrophil (Padj = 8.31e-05) counts. Mendelian randomization studies pinpointed reduced CORO1A, KIF22, and BOLA2-SMG1P6 expression as causative for the lower lymphocyte counts. In conclusion, our data suggest that 16p11.2 deletion, and possibly also the lowest dosage of the BOLA2 duplicon, are associated with low lymphocyte counts. There is a trend toward higher susceptibility to moderate neutropenia in 16p11.2 deletion carriers with lower copy number of the BOLA2 duplicon. Higher numbers of cases are warranted to confirm the association with neutropenia and to resolve the involvement of the deletion coupled with deleterious variants in other genes and/or with the structure and copy number of segments in the CNV breakpoint regions.

17.
Science ; 376(6588): eabj6965, 2022 04.
Article in English | MEDLINE | ID: mdl-35357917

ABSTRACT

Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.


Subjects
DNA Copy Number Variations , Gene Duplication , Genome, Human , Segmental Duplications, Genomic , Evolution, Molecular , GTPase-Activating Proteins/genetics , Humans , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins/genetics
18.
Am J Hum Genet ; 109(4): 631-646, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35290762

ABSTRACT

Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increases the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM compared to short-read data alone.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Female , Humans , Mutation/genetics , Nucleotides , Sequence Analysis, DNA , Software
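
A per-nucleotide, per-generation rate like the one in the abstract above is, at its core, a count of de novo substitutions divided by the number of bases in which they could have been detected. The Python sketch below follows the common convention that each child carries two transmitted haploid genomes. The function name is hypothetical and the inputs are toy values. The study's own estimate also depends on how callable sequence, validation and mosaicism were handled, and the sketch does not model any of that.

```python
def per_generation_rate(dnm_counts: list[int], callable_bases: list[int]) -> float:
    """Pooled de novo substitution rate per nucleotide per generation.

    dnm_counts[i]: validated de novo substitutions found in child i.
    callable_bases[i]: haploid bases in which a DNM could have been detected
    in child i; each child inherits two haploid genomes, hence the factor 2.
    """
    if not dnm_counts or len(dnm_counts) != len(callable_bases):
        raise ValueError("need one callable-genome size per child")
    return sum(dnm_counts) / sum(2 * b for b in callable_bases)


# Hypothetical toy input: one child, 60 DNMs, 2.5 Gbp of callable haploid sequence.
print(per_generation_rate([60], [2_500_000_000]))
```

Shrinking the callable denominator, for example by excluding hard-to-map regions, raises the estimate for a fixed DNM count. This is one reason such rates depend on the reference and the sequencing technology.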
19.
Mol Biol Evol ; 38(12): 5576-5587, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34464971

ABSTRACT

Human centromeres are mainly composed of alpha satellite DNA hierarchically organized as higher-order repeats (HORs). Alpha satellite dynamics is shown by sequence homogenization in centromeric arrays and by its transfer to other centromeric locations, for example, during the maturation of new centromeres. We identified during prenatal aneuploidy diagnosis by fluorescent in situ hybridization a de novo insertion of alpha satellite DNA from the centromere of chromosome 18 (D18Z1) into cytoband 15q26. Although bound by CENP-B, this locus did not acquire centromeric functionality as demonstrated by the lack of constriction and the absence of CENP-A binding. The insertion was associated with a 2.8-kbp deletion and likely occurred in the paternal germline. The site was enriched in long terminal repeats and located ∼10 Mbp from the location where a centromere was ancestrally seeded and became inactive in the common ancestor of humans and apes 20-25 million years ago. Long-read mapping to the T2T-CHM13 human genome assembly revealed that the insertion derives from a specific region of chromosome 18 centromeric 12-mer HOR array in which the monomer size follows a regular pattern. The rearrangement did not directly disrupt any gene or predicted regulatory element and did not alter the methylation status of the surrounding region, consistent with the absence of phenotypic consequences in the carrier. This case demonstrates a likely rare but new class of structural variation that we name "alpha satellite insertion." It also expands our knowledge on alphoid DNA dynamics and conveys the possibility that alphoid arrays can relocate near vestigial centromeric sites.


Subjects
Centromere , Chromosomal Proteins, Non-Histone , Centromere/genetics , Centromere/metabolism , Centromere Protein B/genetics , Centromere Protein B/metabolism , Chromosomal Proteins, Non-Histone/genetics , DNA, Satellite/genetics , Humans , In Situ Hybridization, Fluorescence
20.
Am J Hum Genet ; 108(8): 1436-1449, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34216551

ABSTRACT

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.


Subjects
Chromosome Aberrations , Cytogenetic Analysis/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Mutation , DNA Copy Number Variations , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Karyotyping , Male , Sequence Analysis, DNA